This report is ETC5521 Assignment 1 by Team Brolga, comprising Dhruv Nirmal, Gui Gao, Krisanat Anukarnsakulchularp, and Alison Phan.
This paper explores the evolution of the popularity of games. We use two datasets on board games from R for Data Science’s TidyTuesday, a weekly data project in R. The data originate from Kaggle’s Board Games Geek and were contributed by David Baranger and Georgios Karamanis. A game is often played as a social form of entertainment. Gaming is considered the world’s favourite form of entertainment, with the gaming industry generating more revenue than TV, movies, and music (OppenheimerFunds, 2018). Not only do games transport us to new realities to satisfy our needs for recognition and achievement, but they also keep us interested and engaged through their attention to detail (Barone, 2022). As the gaming industry booms (Read, 2022), it is essential for game developers to identify and target the factors that satisfy customers and attract them to buy their games. With so many games on the market, competition among companies has become increasingly difficult as they face a new generation of buyers with different needs (Prystupa-Rządca & Starostka, 2022). Because it is essential to stay on top of the latest insights from around the globe, we are motivated to investigate the factors that influence buyers’ gaming preferences and purchases, to assist companies in improving their future game products and profitability. We aim to understand the factors that influence buyers psychologically.
There are two datasets: ratings and details. Both are derived from Kaggle by way of Board Games Geek, with a hat tip to David Baranger and Georgios Karamanis.
The ratings dataset contains information about each game: its average rating, rank and theoretical rating average (Bayes average). Each game’s URL and thumbnail link are also provided. It covers games dated from 100 AD to 2022. The ratings dataset is around 4.8MB and contains 21,631 observations and 10 columns. The details dataset is approximately 32.7MB and contains 21,831 observations and 23 columns.
| variable | class | description |
|---|---|---|
| num | double | Game number |
| id | double | Game ID |
| name | character | Game name |
| year | double | Game year |
| rank | double | Game rank |
| average | double | Average rating |
| bayes_average | double | Bayes average rating |
| users_rated | double | Users rated |
| url | character | Game url |
| thumbnail | character | Game thumbnail |
The details dataset provides information about the type of board game, its mechanics, and the artist, creator, and publisher of the game. It gives extensive detail about each game, such as the minimum and maximum number of players, the minimum age and how the game is played, and it categorises games by type, such as building, dice, drinking etc. It covers games dated from 400 AD to 2023.
| variable | class | description |
|---|---|---|
| num | double | Game number |
| id | double | Game ID |
| primary | character | Primary name |
| description | character | Description of game |
| yearpublished | double | Year published |
| minplayers | double | Min n of players |
| maxplayers | double | Max n of players |
| playingtime | double | Playing time in minutes |
| minplaytime | double | Min play time |
| maxplaytime | double | Max play time |
| minage | double | Minimum age |
| boardgamecategory | character | Category |
| boardgamemechanic | character | Mechanic |
| boardgamefamily | character | Board game family |
| boardgameexpansion | character | Expansion |
| boardgameimplementation | character | Implementation |
| boardgamedesigner | character | Designer |
| boardgameartist | character | Artist |
| boardgamepublisher | character | Publisher |
| owned | double | Num owned |
| trading | double | Num trading |
| wanting | double | Num wanting |
| wishing | double | Num wishing |
filter rows: Some games have a yearpublished value below zero, so we removed them using the filter function.
creating variable: We created a new variable, world_year, using mutate to indicate the major event that happened during that year.
join: Our data come from two datasets that share some variables, so we used the inner_join function to merge them for analysis.
separate rows: A board game may have multiple game mechanics, so we used the separate_rows function to split the mechanics at each comma before counting them.
remove special characters: Some strings contain special symbols, so we used the gsub function to delete them.
remove NA rows: Some of the variables we selected contain NA values that affect our plots, so we used the na.omit function to delete those rows.
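As a rough illustration of the cleaning steps above, the sketch below reproduces the filter, join, NA-removal, row-separation and special-character steps on a few made-up rows. It is written in plain Python rather than the R functions named in the list, and the sample records, the helper name `clean_games` and the set of characters treated as "special symbols" are assumptions for illustration only.

```python
import re

# Hypothetical sample rows standing in for the two datasets (values are made up).
ratings = [
    {"id": 1, "name": "Game A", "average": 7.1},
    {"id": 2, "name": "Game B", "average": 5.4},
    {"id": 3, "name": "Game C", "average": 6.0},
]
details = [
    {"id": 1, "yearpublished": 1995, "boardgamemechanic": "['Dice Rolling', 'Trading']"},
    {"id": 2, "yearpublished": -3000, "boardgamemechanic": "['Roll / Spin and Move']"},
    {"id": 3, "yearpublished": 2019, "boardgamemechanic": None},
]


def clean_games(ratings, details):
    """Filter, join, drop missing values, then split mechanics into one row each."""
    # filter rows: keep only games whose publication year is not negative
    kept = [d for d in details if d["yearpublished"] >= 0]
    # join: keep ids present in both tables (an inner join on "id")
    by_id = {r["id"]: r for r in ratings}
    joined = [{**by_id[d["id"]], **d} for d in kept if d["id"] in by_id]
    # remove NA rows: drop games with a missing mechanic string
    joined = [g for g in joined if g["boardgamemechanic"] is not None]
    rows = []
    for game in joined:
        # separate rows: one output row per comma-separated mechanic
        for mechanic in game["boardgamemechanic"].split(","):
            # remove special characters: strip brackets and quotes (assumed set)
            cleaned = re.sub(r"[\[\]'\"]", "", mechanic).strip()
            rows.append({**game, "boardgamemechanic": cleaned})
    return rows


tidy = clean_games(ratings, details)
for row in tidy:
    print(row["name"], "|", row["boardgamemechanic"])
```

Splitting on commas assumes that no mechanic name itself contains a comma; any such names would need to be handled before this step.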
The methods used in this analysis are time series analysis, text wrangling, correlation analysis, and clustering. Time series analysis is used to examine how the data change over time. Text wrangling is used to clean the data into a tidy format. Correlation analysis is used to examine the relationships between the variables of interest, and cluster analysis is used to examine how the observations group together.
Q1. How do board game ratings change by year of publication?
Q2. Trends in game mechanics over time.
Q3. Trends in board game publication rates over time.
Q4. What are the common game mechanics, and how has their prevalence changed?
Q5. What types of games are becoming increasingly popular?
Q6. How many games list each number of mechanics?
Q7. What is the distribution of ratings for the board games?
Q8. Is there a linear relationship between the year of release of a game and the average rating it receives?
Q9. Are board game descriptions more positive or negative?
Q10. What are the 10 most common words used in board game descriptions?
Q11. Is there any relationship between average playing time and ownership or the minimum/maximum number of players?
Q12. How do the number of people who wish to own a game and the number who own it plot against each other?
Q13. How do rank and average rating plot against each other? (Highly ranked games should have higher ratings.)
Q14. Which age group prefers which type of game (e.g. medical, building)? For example, younger children might be interested in building games, and teenagers might be interested in games like Monopoly.
Q15. Are people intrigued by a particular game designer?
Q16. How did the era of electronic games (starting from 2010-11) affect the number of people who want/wish/own board games?
Q17. Is it right to assume that a board game publisher publishes a single type of board game, and do game publishers focus only on a certain age group?
Q18. Are people intrigued by a particular game publisher?
Q19. How accurate has bayes average been?
Q20. Which era or decade played a big role in people playing more board games?
Q21. Games developed in the early years are more in demand despite their low ratings, people still collect it rather than play. Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play. Is this statement true and why is it the case?
Q22. A higher ratio of people who wish to own the game to those who own the game could indicate the popularity of the game regardless of the minimum age, duration of the game, year of publication and the game characteristics. Explore the factors that influence the ownership of the games.
Q23. Given the number of games has been increasing, does this affect the game, e.g., rating, mechanics, playing time?
Q24. Since the computer is more accessible, people are shifting to online games. How does this affect buyer behaviour?
Q1. Games developed in the early years are more in demand despite their low ratings, people still collect it rather than play. Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play. Is this statement true and why is it the case?
Q2. A higher ratio of people who wish to own the game to those who own the game could indicate the popularity of the game regardless of the minimum age, duration of the game, year of publication and the game characteristics. Explore the factors that influence the ownership of the games.
Q3. Given the number of games has been increasing, does this affect the game, e.g., rating, mechanics, playing time?
Q4. Since the computer is more accessible, people are shifting to online games. How does this affect buyer behaviour?
Q1. Games developed in the early years are more in demand despite their low ratings, people still collect it rather than play. Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play. Is this statement true and why is it the case?
The statement “Games developed in the early years are more in demand despite their low ratings, people still collect it rather than play” may be true: consumers tend to buy fewer games than in past decades but spend more time with those games, indicating that games were in much higher demand in the past (Brambilla Hall, 2022). The other statement, “Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play”, may also be true, as colour and lines have been known to create human wants and desires (Zeller, 2022).
Q2. A higher ratio of people who wish to own the game to those who own the game could indicate the popularity of the game regardless of the minimum age, duration of the game, year of publication and the game characteristics. Explore the factors that influence the ownership of the games.
Studies have shown that games exploded in popularity when consumers tried to keep themselves entertained while being forced to stay at home (Epstein, 2022). As games are often considered a form of entertainment and an escape from boredom, we expect game ownership to increase during periods such as the recent pandemic and consequent lockdowns.
Q3. Given the number of games has been increasing, does this affect the game, e.g., rating, mechanics, playing time?
As consumers have less leisure time, we expect playing times to have decreased to fit people’s lifestyles. We expect the average rating to be lower, as there are more games to choose from and therefore more options for consumers to compare. As for game mechanics, we expect consumers in different eras to prefer different game types.
Q4. Since the computer is more accessible, people are shifting to online games. How does this affect buyer behaviour?
It is expected that the number of people who own board games would decrease circa 2012, when popular mobile games such as Subway Surfers and Temple Run were released. It is also expected that when COVID-19 hit in 2020, the number of people who owned games would decrease.
Q1. Games developed in the early years are more in demand despite their low ratings, people still collect it rather than play. Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play. Are these statements true and why is it the case?
| world_year | n |
|---|---|
|  | 133 |
|  | 11114 |
|  | 1575 |
|  | 115 |
|  | 2 |
|  | 52 |
|  | 15 |
|  | 155 |
|  | 1296 |
|  | 1450 |
|  | 4958 |
|  | 582 |
Figure 6.1: Distribution of the bayes average rating
Figure 6.2: Yearly bayes average mean
Figure 6.3: Relationship between the number of people wanting and wishing for the game
Figure 6.4: Conditional proportion
Figure 6.5: Conditional proportion in 2000s
| boardgameartist | n |
|---|---|
| Franz Vohwinkel | 186 |
| Redmond A. Simonsen | 150 |
| Michael Menzel | 105 |
| Dennis Lohausen | 87 |
| Harald Lieske | 80 |
| Rodger B. MacGowan | 78 |
| Klemens Franz | 74 |
| Oliver Freudenreich | 73 |
| Néstor Romeral Andrés | 55 |
| Doris Matthäus | 50 |
Figure 6.6: Relationship between mean number owned and mean average rating
## Dim.1 Dim.2 cluster boardgameartist
## 1 -0.1982177 -0.12866403 2 Alexander Jung
## 2 2.3706198 -0.72648295 1 Alexandre Roche
## 3 -1.1072475 -0.03740996 3 Amelia Sales
## 4 -0.8112835 -0.07605648 3 Anne Pätzke
## 5 1.3319649 -0.74539343 2 Arnaud Demaegd
## 6 -0.9828034 0.35917227 3 Bernhard Skopnik
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.9 93 93
## Dim.2 0.1 7 100
Figure 6.7: Cluster analysis
Due to limited data availability, we focus only on 1914 (the start of World War I) to 2022. Using the Bayes average as the rating trend (Figure 6.2), we can see that the fluctuation has decreased significantly. We can also see that “wanting” and “wishing” are positively correlated (Figure 6.3). Combining “wanting” and “wishing” (Figure 6.4) gives a representation of the games that are in demand. A game whose rating is below the mean Bayes average of 5.68 is considered a low-rating game. Despite their low ratings (Figure 6.5), these games show only a slight decrease in ownership in recent years, and they have been in higher demand especially after the Global Financial Crisis. Games that were published earlier had slightly lower ratings; however, they were not in high demand. This means that our first statement, “games developed in the earlier years are more in demand despite their low ratings, people still collect it rather than play”, is incorrect.
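To make the steps in this paragraph concrete, the following Python sketch computes a Pearson correlation between “wanting” and “wishing”, sums them into a demand measure, and flags games below the 5.68 cut-off. The four games and their values are made up, and the names `pearson`, `demand` and `low_rating` are assumptions for illustration; the report's own analysis was done in R.

```python
from math import sqrt

# Made-up example values; the real columns are "wanting", "wishing" and "bayes_average".
games = [
    {"name": "A", "wanting": 10, "wishing": 40, "bayes_average": 5.50},
    {"name": "B", "wanting": 25, "wishing": 90, "bayes_average": 6.10},
    {"name": "C", "wanting": 5, "wishing": 15, "bayes_average": 5.60},
    {"name": "D", "wanting": 60, "wishing": 210, "bayes_average": 7.20},
]

LOW_RATING_CUTOFF = 5.68  # mean Bayes average quoted in the text


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson([g["wanting"] for g in games], [g["wishing"] for g in games])

for g in games:
    g["demand"] = g["wanting"] + g["wishing"]  # combined demand measure
    g["low_rating"] = g["bayes_average"] < LOW_RATING_CUTOFF

print(f"correlation(wanting, wishing) = {r:.3f}")
print([(g["name"], g["demand"], g["low_rating"]) for g in games])
```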
For this analysis, we focus only on the top 70 games with a single artist. Based on the plot of “average own” against “bayes average” (Figure 6.6), we can see that games from Chris Quilliams and Miguel Coimbra have a high number owned and ratings higher than 6. Interestingly, although the rating for Cyril Bouquet is below 6, the number of games owned is more than 6000. This could indicate that the artwork is attractive and that people own the games as a collection even though they might not be interesting to play. The cluster analysis (Figure 6.7) also shows that the artists can be grouped into 3 clusters, with cluster 1 (red) considered to be high-quality artwork that can be kept as a collection. This means that our second statement, “Game artworks may influence people’s wishes to own the game despite the game being uninteresting to play”, is true.
Q2. A higher ratio of people who wish to own the game to those who own the game could indicate the popularity of the game regardless of the minimum age, duration of the game, year of publication and the game characteristics. Explore the factors that influence the ownership of the games.
Figure 6.8: Relationship between popularity and rating average for each time period
Figure 6.9: Relationship between popularity and rating average for each time period except the second world war
Figure 6.10: Number of game for each game mechanic in each time period
Figure 6.11: T-test for popularity
Figure 6.12: Relationship between minimum age against popularity
Figure 6.13: Relationship between playing time against popularity
Figure 6.14: Relationship between maximum players against popularity
Figure 6.15: Residuals plot
For this analysis, we define popularity as the sum of “wanting” and “wishing” divided by “owned”, which gives an indication of a board game’s popularity. Games published during World War II are excluded from our analysis as they are outliers. Based on Figure 6.8, interestingly, board games gained popularity during World War I, before World War II and during Covid-19. Limiting the data to games with a single category (Figure 6.10), card game and abstract strategy are the two most popular game categories. Intriguingly, during and after the global financial crisis, abstract strategy games were more popular than card games, even though card games are usually the most popular regardless of the period.
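The popularity measure described above can be written as a small function. The sketch below is in Python rather than the R used for the analysis; the function name `popularity`, the sample numbers and the choice to return `None` when a game has no owners are assumptions made for illustration.

```python
def popularity(wanting, wishing, owned):
    """Return (wanting + wishing) / owned as a rough popularity ratio."""
    if owned == 0:
        # Assumption: the ratio is treated as undefined when nobody owns the game.
        return None
    return (wanting + wishing) / owned


# Made-up games: (name, wanting, wishing, owned)
sample = [("Game A", 20, 60, 400), ("Game B", 5, 15, 10), ("Game C", 3, 4, 0)]

scores = {name: popularity(wa, wi, ow) for name, wa, wi, ow in sample}
print(scores)
```

A ratio above 1 means that more people want or wish for the game than currently own it.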
Due to the limited data in certain periods, we removed games published during the World War II, Baby Boomer and Before the Internet periods when performing a one-sided t-test to assess whether popularity is statistically significant. The results showed that all periods are significant, except the period when the Internet was born.
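The text does not state the null hypothesis or the reference value used in the t-test, so the following is only a sketch of one possible form: a one-sample test of whether a period's mean popularity exceeds a hypothetical reference value `mu0`. The data, the reference value and the function name `one_sample_t` are all assumptions, and the code is in Python rather than the R used for the analysis.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - mu0) / standard_error, n - 1


# Made-up popularity ratios for one period and a hypothetical reference value.
period_popularity = [0.5, 0.7, 0.6, 0.8, 0.9]
t_stat, df = one_sample_t(period_popularity, mu0=0.5)
print(f"t = {t_stat:.3f}, df = {df}")
```

For the one-sided alternative that the mean is greater than `mu0`, the p-value is the upper-tail probability of a t distribution with `df` degrees of freedom. Python's standard library has no t distribution, so that last step would need a statistics package.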
Multiple linear regression models were fitted to the data for the different periods. Based on the residuals (Figure 6.15), the models did not fit well for games published during World War I, before World War II and during Covid-19. This suggests that when the world is not at peace, board game popularity can fluctuate a lot. “Playing time” (Figure 6.13) and “maxplayers” (Figure 6.14) both seem to be important factors contributing to a game’s popularity in the most recent years (i.e. post Covid-19). In contrast, “minage” (Figure 6.12) does not seem to be relevant to popularity.
Q3. Given the number of games has been increasing, does this affect the game, e.g., rating, mechanics, playing time?
We would like to only look at post COVID-19 data. The reason for this is that the lockdown could have affected the number of games created.
Figure 6.16: Number of games over time
Figure 6.17: Average rating over time
Figure 6.18: Most Prevalent Mechanics in the 1990s and 2010s
Figure 6.19: Game playing time over time
From Figure 6.17, we can see that the average rating seems to be higher than before, which differs from the expected findings. This could be due to better development of board games in today’s market, where there are more innovative board games.
According to Figure 6.18, the most common game mechanic in both the 1990s and the 2010s is *Dice Rolling*. *Dice Rolling* is a mechanic that can be used in many games. It has been around for a long time: in ancient times, dice were made of stone, clay, bone, etc., so Dice Rolling is often seen as the most dominant symbol of board games (Sofiia & Joseph Alexander, 2017). In Figure 6.19, a log transformation was used to reduce the skewness of the original data. Based on our observation, there is no relationship between playing time and yearpublished, so we conclude that playing time has neither decreased nor increased. This is perhaps because board game designers consider the playing time already optimal and see no need to change it.
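As a small illustration of why a log transformation helps with right-skewed playing times, the Python sketch below compares a moment-based skewness measure before and after taking logs. The playing times are made up and the helper name `skewness` is an assumption; the report itself was produced in R.

```python
from math import log


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up playing times in minutes, with one very long game.
playing_time = [15, 30, 30, 45, 60, 90, 120, 600]

# log() is undefined at zero, so only positive times are transformed.
log_time = [log(t) for t in playing_time if t > 0]

print(f"skewness before: {skewness(playing_time):.2f}")
print(f"skewness after:  {skewness(log_time):.2f}")
```

The single very long game dominates the raw scale, while on the log scale it sits much closer to the rest of the values.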
Q4. Since the computer is more accessible, people are shifting to online games. How does this affect buyer behaviour?
Figure 6.20: The buyer behaviour over time
From Figure 6.20, we can see that the number of people “trading” and “owning” games has decreased since 2015, while the number of people “wishing” and “wanting” has decreased since 2017. Board games were still in demand despite the introduction of online games, although the decline of board games was expected to happen earlier given the rapid advancement in technology. The increasing trend before the peak is likely due to the rise of the internet, which allowed people to discover new board games (Sargeantson, 2022). The steep drop after 2020 may have been a result of COVID-19, where lockdowns affected supply chains and reduced the number of games that could be shipped.
Our data exploration shows that low-rating games were not in demand in the early years. However, game artwork plays a role in influencing buyers to make a purchase. Gaming companies are likely to make good profits during periods when people are bored and want an escape from their surroundings, especially when they are in isolation, such as the lockdowns during COVID-19. Since there is no relationship between playing time and year published, there is no need for game developers to change playing times to fit people’s lifestyles. Game developers should instead focus on creating innovative new board games to maximise their sales and profitability. Gaming companies should take note of the difficulties of shipping their games during periods such as COVID-19 and prepare backup plans to avoid losing potential sales.
Alboukadel Kassambara (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.4.0. https://CRAN.R-project.org/package=ggpubr
Alboukadel Kassambara and Fabian Mundt (2020). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7. https://CRAN.R-project.org/package=factoextra
Barone, R. (2022). Video games are fun. Here’s why, and how they hook us. https://www.idtech.com/blog/what-makes-video-games-fun#:~:text=Why%20are%20video%20games%20fun,to%20their%20attention%20to%20detail.
Brambilla Hall, S. (2022). COVID-19 is taking gaming and esports to the next level. World Economic Forum. https://www.weforum.org/agenda/2020/05/covid-19-taking-gaming-and-esports-next-level/.
C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.
David Robinson, Alex Hayes and Simon Couch (2022). broom: Convert Statistical Objects into Tidy Tibbles. R package version 0.7.12. https://CRAN.R-project.org/package=broom
Epstein, A. (2022). Game on: How COVID-19 became the perfect match for gamers. World Economic Forum. https://www.weforum.org/agenda/2020/09/covid19-coronavirus-pandemic-video-games-entertainment-media/.
Gagolewski M (2021). stringi: Fast and portable character string processing in R. R package version 1.7.6, <URL: https://stringi.gagolewski.com/>.
Gagolewski M (2021). “stringi: Fast and portable character string processing in R.” Journal of Statistical Software. to appear.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
Hadley Wickham and Maximilian Girlich (2022). tidyr: Tidy Messy Data. R package version 1.2.0. https://CRAN.R-project.org/package=tidyr
Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.8. https://CRAN.R-project.org/package=dplyr
Kirill Müller and Hadley Wickham (2021). tibble: Simple Data Frames. R package version 3.1.6. https://CRAN.R-project.org/package=tibble
Lionel Henry and Hadley Wickham (2020). purrr: Functional Programming Tools. R package version 0.3.4. https://CRAN.R-project.org/package=purrr
OppenheimerFunds (2018). Investing in the Soaring Popularity of Gaming. https://www.reuters.com/article/sponsored/popularity-of-gaming.
Pedersen T (2022). patchwork: The Composer of Plots. R package version 1.1.2, https://CRAN.R-project.org/package=patchwork.
Prystupa-Rządca, K., & Starostka, J. (2022). Customer Involvement in the Game Development Process. https://jemi.edu.pl/vol-11-issue-3-2015/customer-involvement-in-the-game-development-process.
Read, S. (2022). Gaming is booming and is expected to keep growing. https://www.weforum.org/agenda/2022/07/gaming-pandemic-lockdowns-pwc-growth/#:~:text=The%20video%20game%20sector%20is,exceed%20%24320%20billion%20by%202026.
Sargeantson, E (2022). Why Board Games Are So Popular. https://mykindofmeeple.com/why-are-board-games-popular/
Tierney N (2017). “visdat: Visualising Whole Data Frames.” JOSS, 2(16), 355. doi:10.21105/joss.00355, https://doi.org/10.21105/joss.00355.
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686, https://doi.org/10.21105/joss.01686.
Yihui Xie (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39.
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
Young, A., & Harrington, J. (2022). World history: These are among the most important global events to happen annually since 1920. https://www.usatoday.com/story/money/2020/09/06/the-worlds-most-important-event-every-year-since-1920/113604790/
Zeller, P. (2022). How Artwork Influences Purchase Decisions. https://zellerhausart.com/blog/how-artwork-influences-purchase-decisions.
Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.